feat: support openai responses api #3
+1,449 −444
IMPORTANT: I am publishing this as a release candidate rather than a proper release because I want to be able to dogfood it in our own API before we release it generally. The test coverage is pretty good, but I want to make sure that this is actually the interface we want. Automated tests can only test logic, not ergonomics.
Overview
With this PR, LLM Bridge supports the Responses API from OpenAI. This API differs from the completions APIs we've dealt with so far in that it is, or at least can be, stateful, and it manages that state through extra provider-specific parameters. For simplicity's sake, LLM Bridge remains unopinionated about the statefulness of the requests being made: we treat the messages that are passed in as exactly those messages, and we don't try to append state or hook into the state-management portion of this API in any way. Doing so is outside the scope of this library and can be trivially done in one's own implementation. We do pass through Responses state hints like `store` and `previous_response_id`, so statefulness in the actual API requests is not broken.

TL;DR
- Requests targeting `/v1/responses` emit a Responses body; other OpenAI endpoints emit Chat bodies by default.
- To force the Responses shape, set `provider_params.openai_target = "responses"` before translating back to provider shape.
- `store`, `previous_response_id`, `include`, `text`, `parallel_tool_calls`, `service_tier`, `truncation`, `background`, `user`, and `metadata` are preserved.
- Function tools map to `universal.tools` (JSON Schema based).
- Built-in tools (`web_search_preview`, `file_search`, `code_interpreter`) round-trip via `provider_params.responses_builtin_tools`.
- Universal `max_tokens` maps to Responses `max_output_tokens` when emitted.

What changed (high level)
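The routing decision in the TL;DR can be sketched as a small predicate. This is an illustrative sketch only: the names `emits_responses_body`, `openai_target`, and the dict-shaped `provider_params` follow this PR description, not a confirmed LLM Bridge API.

```python
def emits_responses_body(target_url: str, provider_params: dict) -> bool:
    """Decide whether the OpenAI formatter should emit a Responses body.

    Hypothetical helper; field names mirror the PR description.
    """
    if provider_params.get("openai_target") == "responses":
        return True  # explicit override forces the Responses shape anywhere
    return "/v1/responses" in target_url  # otherwise auto-detect from the endpoint
```

Under this sketch, a request to `/v1/responses` gets a Responses body with no extra configuration, while any other endpoint only does if the override is set.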
- The OpenAI formatter can now parse Responses bodies (`instructions`, `input[]`, etc.) into the universal shape and emit a valid Responses body when appropriate.
- The universal handler detects `/v1/responses` in the target URL and automatically annotates the universal request so the OpenAI formatter emits a Responses body.

Before → After: Universal handler
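The parse direction can be sketched as follows, assuming a dict-shaped universal message list. This only handles the list form of `input` and the `input_text`/`input_image` part types named in this PR; the real formatter presumably handles more, and the function name is invented for illustration.

```python
def responses_to_universal_messages(body: dict) -> list[dict]:
    """Fold Responses `instructions` and list-form `input[]` into universal messages.

    Hypothetical sketch; the universal message shape is assumed, not confirmed.
    """
    messages = []
    if body.get("instructions"):
        # `instructions` plays the role of a system prompt in the universal shape.
        messages.append({"role": "system", "content": body["instructions"]})
    for item in body.get("input", []):
        parts = []
        for part in item.get("content", []):
            if part.get("type") == "input_text":
                parts.append({"type": "text", "text": part["text"]})
            elif part.get("type") == "input_image":
                parts.append({"type": "image", "url": part.get("image_url")})
        messages.append({"role": item.get("role", "user"), "content": parts})
    return messages
```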
Key differences:
- Responses uses `instructions` and `input[]` parts; Chat uses `messages[]`.
- `store` and `previous_response_id` are passed through on `provider_params` and preserved.
- `max_tokens` (Chat) maps to `max_output_tokens` (Responses) when emitted.
- Built-in tools round-trip via `provider_params.responses_builtin_tools`.

Before → After: Direct translators
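The emit direction of a direct translator might look like the sketch below, assuming string-content universal messages. The function name and the universal request shape are assumptions; only the output field names (`instructions`, `input`, `max_output_tokens`, `store`, `previous_response_id`) come from the PR description.

```python
def universal_to_responses_body(universal: dict) -> dict:
    """Emit a Responses-shaped body from an assumed universal request dict."""
    body = {"model": universal["model"], "input": []}
    for msg in universal.get("messages", []):
        if msg["role"] == "system":
            body["instructions"] = msg["content"]  # system prompt -> instructions
        else:
            body["input"].append({
                "role": msg["role"],
                "content": [{"type": "input_text", "text": msg["content"]}],
            })
    if "max_tokens" in universal:
        body["max_output_tokens"] = universal["max_tokens"]  # renamed for Responses
    # State hints pass through untouched so server-side statefulness keeps working.
    pp = universal.get("provider_params", {})
    for key in ("store", "previous_response_id"):
        if key in pp:
            body[key] = pp[key]
    return body
```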
Tools: function tools vs built-ins
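The split between the two tool kinds can be sketched as below: function tools (which carry JSON Schema) go to `universal.tools`, while the built-in tool entries are parked under `provider_params.responses_builtin_tools` so they round-trip. The helper name is invented; the built-in type names come from the PR description.

```python
RESPONSES_BUILTINS = {"web_search_preview", "file_search", "code_interpreter"}

def split_tools(tools: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate JSON-Schema function tools from Responses built-in tools.

    Returns (function_tools, builtin_tools); a hypothetical sketch only.
    """
    functions, builtins = [], []
    for tool in tools:
        (builtins if tool.get("type") in RESPONSES_BUILTINS else functions).append(tool)
    return functions, builtins
```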
Streaming and token limits
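The streaming flag and token-limit mapping can be sketched as one small function; the name and the `"chat"`/`"responses"` target labels are assumptions for illustration.

```python
def stream_and_limit(universal: dict, target: str) -> dict:
    """Map universal streaming/limit fields onto a Chat or Responses body.

    Hypothetical sketch: `stream` is identical on both endpoints, while the
    universal `max_tokens` is renamed to `max_output_tokens` for Responses.
    """
    out = {}
    if universal.get("stream"):
        out["stream"] = True
    if "max_tokens" in universal:
        key = "max_output_tokens" if target == "responses" else "max_tokens"
        out[key] = universal["max_tokens"]
    return out
```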
- Chat: `stream`, `max_tokens`
- Responses: `stream`, `max_output_tokens` (mapped from universal `max_tokens`)

Migration tips
- Requests to `/v1/responses` auto-emit the Responses shape. To force the Responses shape elsewhere, set `provider_params.openai_target = "responses"` prior to translation.
- Responses `input` parts such as `{ type: "input_text" }` and `{ type: "input_image" }` map to universal `messages` with `text`/`image` content.

References
- docs/openai-responses.md
- https://platform.openai.com/docs/api-reference/responses